Abstract. The chase procedure, an algorithm proposed 25+ years ago 
to fix constraint violations in database instances, has been successfully 
applied in a variety of contexts, such as query optimization, data exchange, 
and data integration. Its practicability, however, is limited by the 
fact that - for an arbitrary set of constraints - it might not terminate; 
even worse, chase termination is an undecidable problem in general. In 
response, the database community has proposed sufficient restrictions on 
top of the constraints that guarantee chase termination on any database 
instance. In this paper, we propose a novel sufficient termination condi- 
tion, called inductive restriction, which strictly generalizes previous con- 
ditions, but can be checked as efficiently. Furthermore, we motivate and 
study the problem of data-dependent chase termination and, as a key result, 
present sufficient termination conditions w.r.t. fixed instances. They 
are strictly more general than inductive restriction and might guarantee 
termination although the chase does not terminate in the general case. 

1 Introduction 

The chase procedure is a fundamental algorithm that has been successfully 
applied in a variety of database applications [10,7,2,6,9,13,5,11]. Originally 
proposed to tackle the implication problem for data dependencies [10,2] and to 
optimize Conjunctive Queries (CQs) under data dependencies [1,7], it has become 
a central tool in Semantic Query Optimization (SQO) [12,5,14]. For instance, the 
chase can be used to enumerate minimal CQs under a set of dependencies [5] , thus 
supporting the search for more efficient query evaluation plans. Beyond SQO, 
it has been applied in many other contexts, such as data exchange [13], data 
integration [S] , query answering using views [5] , and probabilistic databases [TT] . 
The core idea of the chase algorithm is simple: given a set of dependencies (also 
called constraints) over a database schema and an instance as input, it fixes 
constraint violations in the instance. One problem with the chase, however, is 
that - given an arbitrary set of constraints - it might never terminate; even worse, 
this problem is undecidable in general, also for a fixed instance [1]. Addressing 
this issue, sufficient conditions for the constraints that guarantee termination 
on any database instance have been proposed [13,4,14]. Such conditions are the 
central topic in this paper. In particular, we make two key contributions. 
A novel sufficient termination condition for the chase. We introduce 
the class of inductively restricted constraints, for which the chase terminates in 
polynomial time data complexity. Like existing sufficient termination conditions, 
inductive restriction asserts that there are no positions in the schema where fresh 
labeled nulls might be cyclically created during chase application. It relies on 
a sophisticated study of (a) positions in the database schema where null values 
might appear, (b) subsets of the constraints that cyclically pass null values, and 
(c) connections between such cycles. The combination of these aspects makes in- 
ductive restriction more general than previous sufficient termination conditions, 
thus making a larger class of constraints amenable to the chase procedure. 

Data-dependent chase termination. Whenever inductive restriction does 
not apply to a constraint set, no termination guarantees for the general case can 
be derived. Arguably, reasonable applications should never risk non-termination, 
so the chase algorithm cannot be safely applied to any instance in this case. 
Tackling this problem, we study data-dependent chase termination: given a constraint 
set $\Sigma$ and a fixed instance $I$, does the chase with $\Sigma$ terminate on $I$? This setting 
particularly makes sense in the context of SQO, where the query - interpreted 
as database instance - is chased: typically, the size of the query is small, so the 
"data" part can be analyzed efficiently (as opposed to the case where the input 
is a large database instance). We propose two complementary approaches. 
Our first, static scheme relies on the observation that, when the instance $I$ is 
fixed, we can safely ignore constraints in the constraint set that will never fire 
when chasing $I$; that is, it suffices that a general sufficient termination condition 
holds for those constraints that might fire on $I$. As a fundamental result, we show 
that in general it is undecidable if a constraint will never fire when chasing a 
fixed instance. 
Nevertheless, we provide a sufficient condition that allows us to identify such 
constraints, and derive a sufficient data-dependent termination condition. 

Whenever this static approach fails, our second, dynamic approach comes into 
play: we run the chase and track cyclically created fresh null values in a so-called 
monitor graph. We then fix the maximum depth of cycles in the monitor graph 
and stop the chase when this limit is exceeded: in such a case, no termination 
guarantees can be made. However, the search depth implicitly defines a class 
of constraint-instance pairs for which the chase terminates. It can be seen as a 
natural condition that allows us to stop the chase when "dangerous" situations 
arise. Hence, our approach adheres to situations that might well cause non- 
termination and is preferable to blindly running the chase and aborting after a 
fixed amount of time, or a fixed number of chase steps. Applications might choose 
the maximum search depth following a pay-as-you-go paradigm. Ultimately, the 
combination of static and dynamic analysis allows us to safely apply the chase, 
although no data-independent termination guarantees can be made. 

Structure. We start with some preliminaries in the following section and, in 
Section 3, continue with a discussion of non-termination and a motivating example 
for data-dependent chase termination. Section 4 introduces inductive restriction, 
our sufficient (data-independent) termination condition. Finally, we present our 
static and dynamic approach to data-dependent chase termination in Section 5. 



2 Preliminaries 



General mathematical notation. The natural numbers $\mathbb{N}$ do not include 0. 
For $n \in \mathbb{N}$, we denote by $[n]$ the set $\{1, \ldots, n\}$. For a set $M$, we denote by $2^M$ 
its powerset. Given a tuple $t = (t_1, \ldots, t_n)$ we define the tuple obtained by 
projecting on positions $1 \le i_1 < \cdots < i_m \le n$ as $p_{i_1,\ldots,i_m}(t) := (t_{i_1}, \ldots, t_{i_m})$. 
Databases. We fix three pairwise disjoint infinite sets: the set of constants $\Delta$, 
the set of labeled nulls $\Delta_{null}$, and the set of variables $V$. A database schema $\mathcal{R}$ is 
a finite set of relational symbols $\{R_1, \ldots, R_n\}$. In the rest of the paper, we assume 
the database schema and the set of constants and labeled nulls to be fixed. A 
database instance $I$ is a finite set of $\mathcal{R}$-atoms that contains only elements from 
$\Delta \cup \Delta_{null}$ in its positions. We call an element of an instance a fact. The 
domain of $I$, $dom(I)$, is the set of elements from $\Delta \cup \Delta_{null}$ that appear in $I$. 
We use the term position to denote a position in a predicate, e.g. a three-ary 
predicate $R$ has three positions $R^1, R^2, R^3$. We say that a variable, labeled null, 
or constant $c$ appears e.g. in position $R^1$ if there exists a fact $R(c, \ldots)$. 
Constraints. Let $\bar{x}, \bar{y}$ be tuples of variables. We consider two types of database 
constraints: tuple generating dependencies (TGDs) and equality generating 
dependencies (EGDs). A TGD is a first-order sentence $\alpha := \forall \bar{x}(\phi(\bar{x}) \rightarrow \exists \bar{y}\, \psi(\bar{x}, \bar{y}))$ 
such that (a) both $\phi$ and $\psi$ are conjunctions of atomic formulas (possibly with 
parameters from $\Delta$), (b) $\psi$ is not empty, (c) $\phi$ is possibly empty, (d) both $\phi$ and 
$\psi$ do not contain equality atoms and (e) all variables from $\bar{x}$ that occur in $\psi$ 
must also occur in $\phi$. We denote by $pos(\alpha)$ the set of positions in $\phi$. An EGD is a 
first-order sentence $\alpha := \forall \bar{x}(\phi(\bar{x}) \rightarrow x_i = x_j)$, where $x_i, x_j$ occur in $\phi$ and $\phi$ is a 
non-empty conjunction of equality-free $\mathcal{R}$-atoms (possibly with parameters from 
$\Delta$). We denote by $pos(\alpha)$ the set of positions in $\phi$. As a notational convenience, 
we will often omit the $\forall$-quantifier and respective list of universally quantified 
variables. For a set of TGDs and EGDs $\Sigma$ we set $pos(\Sigma) := \bigcup_{\xi \in \Sigma} pos(\xi)$. 
Chase. We assume that the reader is familiar with the chase procedure and give 
only a short introduction here, referring the interested reader to [13] for a more 
detailed discussion. A chase step $I \xrightarrow{\alpha,\bar{a}} J$ takes a relational database instance $I$ 
such that $I \not\models \alpha(\bar{a})$ and adds tuples (in case of TGDs) or collapses some elements 
(in case of EGDs) such that the resulting relational database $J$ is a model of $\alpha(\bar{a})$. 
If $J$ was obtained from $I$ in this way, we sometimes also write $I\bar{a} \oplus C_\alpha$ instead 
of $J$. A chase sequence is an exhaustive application of applicable constraints 
$I_0 \xrightarrow{\alpha_0,\bar{a}_0} I_1 \xrightarrow{\alpha_1,\bar{a}_1} \ldots$, where we impose no strict order on what constraint to apply 
in case several constraints are applicable. If this sequence is finite, say $I_r$ being its 
final element, the chase terminates and its result is defined as $I_r$. The length of 
this chase sequence is $r$. Note that different application orders may lead 
to different chase results. However, as proven in [13], two different chase orders 
always lead to homomorphically equivalent results, if these exist. Therefore, we 
write $I^\Sigma$ for the result of the chase on an instance $I$ under constraints $\Sigma$. It has 
been shown in [10,2,7] that $I^\Sigma \models \Sigma$. If a chase step cannot be performed (e.g., 
because application of an EGD would have to equate two constants) or in case 
of an infinite chase sequence, the result of the chase is undefined. 



Sample Schema: hasAirport(c_id), fly(c_id1, c_id2, dist), rail(c_id1, c_id2, dist) 
Constraint Set: $\Sigma := \{\alpha_1, \alpha_2, \alpha_3\}$, where 

$\alpha_1$: If there is a flight connection between two cities, both of them have an airport: 
    $\text{fly}(x_1, x_2, y) \rightarrow \text{hasAirport}(x_1), \text{hasAirport}(x_2)$ 
$\alpha_2$: Rail-connections are symmetrical: 
    $\text{rail}(x_1, x_2, y) \rightarrow \text{rail}(x_2, x_1, y)$ 
$\alpha_3$: Each city that is reachable via plane has at least one outgoing flight scheduled: 
    $\text{fly}(x_1, x_2, y_1) \rightarrow \exists x_3, y_2\; \text{fly}(x_2, x_3, y_2)$ 

Fig. 1. Sample Database Schema and Constraints of a Travel Agency. 

3 A Motivating Example 

Non-termination of the chase is caused by fresh labeled null values that are 
repeatedly created when fixing constraint violations. As an example, consider the 
travel agency database in Figure 1. Predicate hasAirport contains cities that 
have an airport and fly (rail) stores flight (rail) connections between cities, 
including their distance. In addition to the schema, constraints $\alpha_1$-$\alpha_3$ have been 
specified, e.g. $\alpha_3$ might have been added to assert that, for each city reachable 
via plane, the schedule is integrated in the local database. Now consider the CQ 
$q_1$ below (in datalog notation, with constant $c_1$ and variables $x_1, x_2, y_1, y_2$). 

q_1: rf(x_2) :- rail(c_1, x_1, y_1), fly(x_1, x_2, y_2) 

The query selects all cities that can be reached from $c_1$ through rail-and-fly. To 
chase $q_1$, we interpret its body as instance $I := \{\text{rail}(c_1, x_1, y_1), \text{fly}(x_1, x_2, y_2)\}$, 
where $c_1$ is a constant and the $x_i$, $y_i$ are labeled nulls. We observe that $\alpha_3$ does not 
hold on $I$, since there is a flight to city $x_2$, but no outgoing flight from $x_2$. To fix 
this violation, the chase adds a new tuple $t_1 := \text{fly}(x_2, x_3, y_3)$ to $I$, where $x_3, y_3$ 
are fresh labeled null values. However, in the resulting instance $I' := I \cup \{t_1\}$, 
$\alpha_3$ is again violated (this time for $x_3$) and in subsequent steps the chase adds 
$\text{fly}(x_3, x_4, y_4)$, $\text{fly}(x_4, x_5, y_5)$, $\text{fly}(x_5, x_6, y_6)$, and so on. Clearly, it will never terminate. 
Reasonable applications should not risk non-termination, so for the constraint 
set in Figure 1 termination is in question for all queries, although there might 
be queries for which the chase terminates. Tackling this problem, we propose to 
investigate data-dependent chase termination, i.e. to study sufficient termination 
guarantees for a fixed instance when no general termination guarantees apply. 
We illustrate the benefits of having such guarantees for query $q_2$ below, which 
selects all cities $x_2$ that can be reached from $c_1$ via rail-and-fly and from which 
the same transport route leads back to $c_1$ ($c_1$ is a constant, $x_i$, $y_i$ are variables). 

q_2: rffr(x_2) :- rail(c_1, x_1, y_1), fly(x_1, x_2, y_2), fly(x_2, x_1, y_2), rail(x_1, c_1, y_1) 

Query $q_2$ violates only $\alpha_1$. The chase terminates and transforms $q_2$ into 

q_2': rffr(x_2) :- rail(c_1, x_1, y_1), fly(x_1, x_2, y_2), fly(x_2, x_1, y_2), rail(x_1, c_1, y_1), 
      hasAirport(x_1), hasAirport(x_2) 

The resulting query $q_2'$ satisfies all constraints and is a so-called universal plan [5]: 
intuitively, it incorporates all possible ways to answer the query. As discussed 
in [5], the universal plan forms the basis for finding smaller equivalent queries 
(under the respective constraints), by choosing subqueries of $q_2'$ and testing if 
they can be chased to a query that is homomorphic to $q_2'$. Using this technique 
we can easily show that the following two queries are equivalent to $q_2$. 

q_2'': rffr(x_2) :- rail(c_1, x_1, y_1), fly(x_1, x_2, y_2), fly(x_2, x_1, y_2) 

q_2''': rffr(x_2) :- hasAirport(x_1), rail(c_1, x_1, y_1), fly(x_1, x_2, y_2), fly(x_2, x_1, y_2) 

Instead of $q_2$ we thus could evaluate $q_2''$ or $q_2'''$, which might well be more 
performant: in both $q_2''$ and $q_2'''$ the join with $\text{rail}(x_1, c_1, y_1)$ has been 
eliminated; moreover, if hasAirport is duplicate-free, the additional join of rail with 
hasAirport in $q_2'''$ may serve as a filter that decreases the size of intermediate 
results and speeds up query evaluation. This strategy is called join introduction 
in SQO (cf. [S]). Ultimately, the chase for $q_2$ made it possible to detect $q_2''$ and 
$q_2'''$, so it would be desirable to have data-dependent termination guarantees that 
allow us to chase $q_2$ (and $q_2''$, $q_2'''$). We will present such conditions in Section 5. 
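
The behaviour of the chase on $q_1$ and $q_2$ described in this section can be reproduced with a small script. The following Python sketch is an illustration only and not the implementation used in the paper: it assumes TGDs without constants, writes constraint variables with a leading `?`, represents facts as `(predicate, args)` pairs, and uses a hypothetical `max_steps` budget because the chase with $\alpha_3$ need not terminate.

```python
from itertools import count

def is_var(term):
    # Assumption: constraint variables are strings starting with "?".
    return term.startswith("?")

def matches(atoms, facts, subst):
    """Yield every extension of `subst` that maps all `atoms` into `facts`."""
    if not atoms:
        yield dict(subst)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s = dict(subst)
        if all(s.setdefault(a, v) == v if is_var(a) else a == v
               for a, v in zip(args, fargs)):
            yield from matches(rest, facts, s)

def chase(instance, tgds, max_steps):
    """Apply TGD chase steps until nothing is violated or the budget is spent.

    Returns (facts, terminated)."""
    facts = set(instance)
    fresh = count(1)
    for _ in range(max_steps):
        trigger = None
        for body, head in tgds:
            for s in matches(body, sorted(facts), {}):
                # A step applies only if the head is not already satisfied.
                if next(matches(head, sorted(facts), s), None) is None:
                    trigger = (head, s)
                    break
            if trigger:
                break
        if trigger is None:
            return facts, True
        head, s = trigger
        for var in {a for _, args in head for a in args if a not in s}:
            s[var] = f"n{next(fresh)}"  # fresh labeled null for an existential variable
        facts |= {(pred, tuple(s[a] for a in args)) for pred, args in head}
    return facts, False

# Constraints alpha_1, alpha_2, alpha_3 of Figure 1 as (body, head) pairs.
SIGMA = [
    ([("fly", ("?x1", "?x2", "?y"))],
     [("hasAirport", ("?x1",)), ("hasAirport", ("?x2",))]),
    ([("rail", ("?x1", "?x2", "?y"))], [("rail", ("?x2", "?x1", "?y"))]),
    ([("fly", ("?x1", "?x2", "?y1"))], [("fly", ("?x2", "?x3", "?y2"))]),
]
Q1 = {("rail", ("c1", "x1", "y1")), ("fly", ("x1", "x2", "y2"))}
Q2 = Q1 | {("fly", ("x2", "x1", "y2")), ("rail", ("x1", "c1", "y1"))}

print(chase(Q2, SIGMA, 50)[1])  # the chase of q_2 reaches a fixpoint
print(chase(Q1, SIGMA, 50)[1])  # the chase of q_1 uses up the whole budget
```

For $q_2$ the only step that fires is $\alpha_1$, which adds the two hasAirport facts of $q_2'$; for $q_1$ the loop keeps creating fresh nulls until the budget is exhausted.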

4 Data-independent Chase Termination 

In the past, sufficient conditions for constraint sets have been developed that 
guarantee chase termination for any instance. One such condition is weak acyclic- 
ity [13], which asserts that there are no cyclically connected positions in the 
constraint set that may introduce fresh labeled null values, by a global study of 
relations between the constraints. In [4], weak acyclicity was generalized to 
stratification, which enforces weak acyclicity only locally, for subsets of constraints 
that might cyclically cause each other to fire. We further generalized 
stratification to safe restriction in [14]. We start by reviewing its central ideas and formal 
definition, which form the basis for our novel condition inductive restriction. 
Safe Restriction. The idea of safe restriction is to keep track of the positions in 
which fresh null values might be created or to which they might be copied. As a basic tool, we borrow the 
definition of affected positions from [3]. We emphasize that, in [3], this definition 
has been used in a different context: there, the constraints are interpreted as 
axioms that are used to derive new facts from the database and the problem is 
query answering on the implied database, using the chase as a central tool. 

Definition 1. [3] Let $\Sigma$ be a set of TGDs. The set of affected positions $aff(\Sigma)$ is 
defined inductively as follows. Let $\pi$ be a position in the head of an $\alpha \in \Sigma$. 

• If an existentially quantified variable appears in $\pi$, then $\pi \in aff(\Sigma)$. 

• If the same universally quantified variable $X$ appears both in position $\pi$, and 
only in affected positions in the body of $\alpha$, then $\pi \in aff(\Sigma)$. □ 

Akin to the dependency graph in weak acyclicity [13], we define a safety condition 
that asserts the absence of cycles through constraints that may introduce fresh 
null values. As an improvement, we exploit the observation that only values 
created due to or copied from affected positions may cause non-termination. We 
introduce the notion of propagation graph, which refines the dependency graph 
from [13] by taking affected positions into consideration. 



Definition 2. Let $\Sigma$ be a set of TGDs. We define a directed graph called 
propagation graph $prop(\Sigma) := (aff(\Sigma), E)$ as follows. There are two kinds of edges 
in $E$. Add them as follows: for every TGD $\forall \bar{x}(\phi(\bar{x}) \rightarrow \exists \bar{y}\, \psi(\bar{x}, \bar{y})) \in \Sigma$ and for 
every $x$ in $\bar{x}$ that occurs in $\psi$ and every occurrence of $x$ in $\phi$ in position $\pi_1$ 

• if $x$ occurs only in affected positions in $\phi$ then, for every occurrence of $x$ in 
$\psi$ in position $\pi_2$, add an edge $\pi_1 \rightarrow \pi_2$ (if it does not already exist). 

• if $x$ occurs only in affected positions in $\phi$ then, for every existentially 
quantified variable $y$ and for every occurrence of $y$ in a position $\pi_2$, add a special 
edge $\pi_1 \xrightarrow{*} \pi_2$ (if it does not already exist). □ 

Definition 3. A set $\Sigma$ of constraints is called safe iff $prop(\Sigma)$ has no cycles 
going through a special edge. □ 
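
Definitions 1 to 3 translate directly into a fixpoint computation followed by a reachability test. The Python sketch below is a minimal illustration under stated assumptions: constraints are TGDs given as `(body, head)` pairs of atoms `(predicate, args)`, every argument is a variable, and a head variable that does not occur in the body is treated as existentially quantified. The function names are hypothetical and chosen for this example only.

```python
def positions_of(var, atoms):
    """Positions (predicate, index) at which `var` occurs in a list of atoms."""
    return {(p, i) for p, args in atoms for i, a in enumerate(args, 1) if a == var}

def variables(atoms):
    return {a for _, args in atoms for a in args}

def affected(tgds):
    """Fixpoint computation of the affected positions (Definition 1)."""
    aff, changed = set(), True
    while changed:
        changed = False
        for body, head in tgds:
            for p, args in head:
                for i, v in enumerate(args, 1):
                    existential = v not in variables(body)
                    if (p, i) not in aff and (existential or positions_of(v, body) <= aff):
                        aff.add((p, i))
                        changed = True
    return aff

def propagation_graph(tgds):
    """Return (normal_edges, special_edges) as in Definition 2."""
    aff, normal, special = affected(tgds), set(), set()
    for body, head in tgds:
        exist = variables(head) - variables(body)
        for x in variables(body) & variables(head):
            if not (positions_of(x, body) <= aff):
                continue  # x also occurs in a non-affected body position
            for p1 in positions_of(x, body):
                normal |= {(p1, p2) for p2 in positions_of(x, head)}
                special |= {(p1, p2) for y in exist for p2 in positions_of(y, head)}
    return normal, special

def is_safe(tgds):
    """Definition 3: no cycle of the propagation graph uses a special edge."""
    normal, special = propagation_graph(tgds)
    succ = {}
    for u, v in normal | special:
        succ.setdefault(u, set()).add(v)

    def reaches(src, dst):
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # A special edge u -> v lies on a cycle iff u is reachable from v.
    return not any(reaches(v, u) for u, v in special)

# The two constraints used in Example 1 of this section.
A1 = ([("S", ("x",)), ("E", ("x", "y"))], [("E", ("y", "x"))])
A2 = ([("S", ("x",)), ("E", ("x", "y"))], [("E", ("y", "z")), ("E", ("z", "x"))])
print(affected([A1, A2]), is_safe([A1, A2]))
```

On these two constraints the sketch finds both positions of E affected and reports the set as not safe, because the special edge from $E^2$ to $E^2$ is a self-loop.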

Safety is a sufficient termination condition which strictly generalizes weak acyclic- 
ity and is different from stratification [Tl|. The idea behind safe restriction now 
is to assert safety locally, for subsets of the constraints that may cyclically cause 
each other to fire in such a way that null values are passed in these cycles. 

Definition 4. Let $\Sigma$ be given and $P \subseteq pos(\Sigma)$. For all $\alpha, \beta \in \Sigma$, we define 
$\alpha \prec_P \beta$ iff there are tuples $\bar{a}, \bar{b}$ and a database instance $I$ s.t. (i) $I \not\models \alpha(\bar{a})$, 
(ii) $I \models \beta(\bar{b})$, (iii) $I \xrightarrow{\alpha,\bar{a}} J$, (iv) $J \not\models \beta(\bar{b})$, (v) $I$ contains null values only in 
positions from $P$ and (vi) there is a null value $n \in \bar{b} \cap \Delta_{null}$ in the head of $\beta(\bar{b})$. □ 

Informally, $\alpha \prec_P \beta$ holds if $\alpha$ might cause $\beta$ to fire s.t., when null values occur 
only in positions from $P$, $\beta$ copies some null values. We next introduce a notion 
for affected positions relative to a constraint and a set of positions. 

Definition 5. For any set of positions $P$ and a TGD $\alpha$ let aff-cl($\alpha$, $P$) be the 
set of positions $\pi$ from the head of $\alpha$ such that 

• for every universally quantified variable $x$ in $\pi$: $x$ occurs in the body of $\alpha$ 
only in positions from $P$ or 

• $\pi$ contains an existentially quantified variable. □ 
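
Definition 5 can be read as a one-pass filter over the head positions of a TGD. The following Python sketch illustrates that reading; it assumes the same hypothetical representation as before (a TGD is a `(body, head)` pair of atoms `(predicate, args)`, all arguments are variables, and head variables that do not occur in the body are existential).

```python
def aff_cl(tgd, P):
    """Head positions of `tgd` selected by Definition 5 for the position set P."""
    body, head = tgd
    P = set(P)
    body_vars = {a for _, args in body for a in args}

    def body_positions(var):
        return {(p, i) for p, args in body for i, a in enumerate(args, 1) if a == var}

    # Collect, for every head position, the variables that occur there.
    vars_at = {}
    for p, args in head:
        for i, v in enumerate(args, 1):
            vars_at.setdefault((p, i), set()).add(v)

    result = set()
    for pos, vs in vars_at.items():
        has_existential = any(v not in body_vars for v in vs)
        universal_from_P = all(body_positions(v) <= P for v in vs if v in body_vars)
        if has_existential or universal_from_P:
            result.add(pos)
    return result

# alpha_1 := S(x), E(x,y) -> E(y,x)  (the first constraint of Example 1)
A1 = ([("S", ("x",)), ("E", ("x", "y"))], [("E", ("y", "x"))])
print(aff_cl(A1, {("E", 1), ("E", 2)}))
```

With $P = \{E^1, E^2\}$ only $E^1$ is selected for $\alpha_1$: the variable $x$ in head position $E^2$ also occurs in body position $S^1$, which is not in $P$.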

On top of previous definitions we introduce the central tool of restriction systems. 

Definition 6. A restriction system is a pair $(G'(\Sigma), f)$, where $G'(\Sigma) := (\Sigma, E)$ 
is a directed graph and $f : \Sigma \rightarrow 2^{pos(\Sigma)}$ is a function such that 

• for all TGDs $\alpha$ and for all $(\alpha, \beta) \in E$: aff-cl($\alpha$, $f(\alpha)$) $\cap\; pos(\{\beta\}) \subseteq f(\beta)$, 

• for all EGDs $\alpha$ and for all $(\alpha, \beta) \in E$: $f(\alpha) \cap pos(\{\beta\}) \subseteq f(\beta)$, and 

• for all $\alpha, \beta \in \Sigma$: $\alpha \prec_{f(\alpha)} \beta \Rightarrow (\alpha, \beta) \in E$. 

A restriction system is minimal if it is obtained from $((\Sigma, \emptyset), \{(\alpha, \emptyset) \mid \alpha \in \Sigma\})$ 
by a repeated application of the constraints from bullets one to three (until all 
constraints hold) s.t., in case of the first and second bullet, the image of $f(\beta)$ is 
extended only by those positions that are required to satisfy the condition. □ 



part($\Sigma$: Set of TGDs and EGDs) { 
  compute the strongly connected components (as sets of constraints) $C_1, \ldots, C_n$ 
    of the minimal restriction system of $\Sigma$; 
  $D \leftarrow \emptyset$; 
  if ($n == 1$) then 
    if ($C_1 \neq \Sigma$) then return part($C_1$); endif 
    return $\{\Sigma\}$; 
  endif 
  for $i = 1$ to $n$ do $D \leftarrow D \cup$ part($C_i$); endfor 
  return $D$; } 

Fig. 2. Algorithm to compute subsets of $\Sigma$. 
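
The recursion of Figure 2 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: computing the minimal restriction system is delegated to a hypothetical oracle `edges_of(sigma)` that returns its edge set, and a strongly connected component is counted only if it contains a cycle, which is the reading suggested by the examples of this section.

```python
def nontrivial_sccs(nodes, edges):
    """Strongly connected components that contain at least one cycle."""
    reach = {v: set() for v in nodes}
    for u, v in edges:
        reach[u].add(v)
    changed = True
    while changed:  # naive transitive closure, fine for small constraint sets
        changed = False
        for u in nodes:
            new = set().union(*(reach[v] for v in reach[u])) - reach[u]
            if new:
                reach[u] |= new
                changed = True
    return {
        frozenset(v for v in nodes if v in reach[u] and u in reach[v])
        for u in nodes if u in reach[u]
    }

def part(sigma, edges_of):
    """Recursive decomposition of a constraint set, following Figure 2."""
    sigma = frozenset(sigma)
    comps = nontrivial_sccs(sigma, edges_of(sigma))
    if len(comps) == 1:
        (c,) = comps
        if c != sigma:
            return part(c, edges_of)
        return {sigma}
    result = set()
    for c in comps:
        result |= part(c, edges_of)
    return result

# Hand-written oracle: the edge sets stated in Examples 1 and 3 of this section.
EDGES = {
    frozenset({"a1", "a2", "a3"}): {("a1", "a2"), ("a2", "a1"), ("a3", "a1"), ("a3", "a2")},
    frozenset({"a1", "a2"}): {("a2", "a1")},
}
print(part({"a1", "a2", "a3"}, EDGES.get))
```

For the three constraints of Example 3 the first call finds the single component $\{\alpha_1, \alpha_2\}$, and the recursive call on that component finds no cycle at all, so the result is the empty set.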



Example 1. Let predicate $E(x, y)$ store graph edges and predicate $S(x)$ store 
some nodes. The constraints $\Sigma = \{\alpha_1, \alpha_2\}$ with $\alpha_1 := S(x), E(x, y) \rightarrow E(y, x)$ 
and $\alpha_2 := S(x), E(x, y) \rightarrow \exists z\; E(y, z), E(z, x)$ assert that all nodes in $S$ have a 
cycle of length 1 and 2. It holds that $aff(\Sigma) = \{E^1, E^2\}$ and it is easy to verify 
that $\Sigma$ is neither safe nor stratified (see Def. 2 in [4]). The minimal restriction 
system for $\Sigma$ is $G'(\Sigma) := (\Sigma, \{(\alpha_2, \alpha_1)\})$ with $f(\alpha_1) := \{E^1, E^2\}$ and $f(\alpha_2) := \emptyset$; in 
particular, $\alpha_1 \not\prec_{f(\alpha_1)} \alpha_1$, $\alpha_1 \not\prec_{f(\alpha_1)} \alpha_2$, $\alpha_2 \prec_{f(\alpha_2)} \alpha_1$, and $\alpha_2 \not\prec_{f(\alpha_2)} \alpha_2$ hold. □ 

As shown in [14], the minimal restriction system is unique and can be computed 
by an NP-algorithm. We are ready to define the notion of safe restriction: 

Definition 7. $\Sigma$ is called safely restricted if and only if every strongly connected 
component of its minimal restriction system is safe. □ 

Example 2. Constraint set $\Sigma$ from Example 1 is safely restricted: its minimal 
restriction system contains no strongly connected components. □ 

As shown in [14], safe restriction (a) guarantees chase termination in polynomial 
time data complexity, (b) is strictly more general than stratification, and (c) can 
be checked by a coNP-algorithm. 
Inductive Restriction. We now introduce the novel class of inductively re- 
stricted constraints, which generalizes safe restriction but, like the latter, gives 
polynomial-time termination guarantees. We start with a motivating example. 

Example 3. We extend the constraints from Example 1 to $\Sigma' := \Sigma \cup \{\alpha_3\}$, where 
$\alpha_3 := \exists x, y\; S(x), E(x, y)$. Then $G'(\Sigma') := (\Sigma', \{(\alpha_1, \alpha_2), (\alpha_2, \alpha_1), (\alpha_3, \alpha_1), (\alpha_3, \alpha_2)\})$ 
with $f(\alpha_1) = f(\alpha_2) := \{E^1, E^2, S^1\}$ and $f(\alpha_3) := \emptyset$ is the minimal restriction 
system. It contains the strongly connected component $\{\alpha_1, \alpha_2\}$, which is not 
safe. Consequently, $\Sigma'$ is not safely restricted. □ 

Intuitively, safe restriction does not apply in the example above because $\alpha_3$ 
"infects" position $S^1$ in the restriction system. However, null values cannot be 
repeatedly created in $S^1$: $\alpha_3$ fires at most once, so it does not affect chase 
termination. Our novel termination condition recognizes such situations by recursively 
computing the minimal restriction systems of the strongly connected 
components. We formalize this computation in Algorithm 1, called part($\Sigma$). Based on 
this algorithm, we define an improved sufficient termination condition. 



Definition 8. Let $\Sigma$ be a set of constraints. We call $\Sigma$ inductively restricted iff 
for all $\Sigma' \in$ part($\Sigma$) it holds that $\Sigma'$ is safe. □ 

As stated in the following lemma, inductive restriction strictly generalizes safe 
restriction, but does not increase the complexity of the recognition problem. 

Lemma 1. Let $\Sigma$ be a set of constraints. 

• If $\Sigma$ is safely restricted, then it is inductively restricted. 

• There is some $\Sigma$ that is inductively restricted, but not safely restricted. 

• The recognition problem for inductive restriction is in coNP. □ 

Example 4. Consider $\Sigma'$ from Example 3. It is easy to verify that part($\Sigma'$) $= \emptyset$ 
and we conclude that $\Sigma'$ is inductively restricted. As argued in Example 3, $\Sigma'$ is 
not safely restricted, which proves the second claim in Lemma 1. □ 

The next theorem gives the main result of this section, showing that inductive 
restriction guarantees chase termination in polynomial time data complexity. 
To the best of our knowledge inductive restriction is the most general sufficient 
termination condition for the chase that has been proposed so far. 

Theorem 1. Let $\Sigma$ be a fixed set of inductively restricted constraints. Then, 
there exists a polynomial $Q \in \mathbb{N}[X]$ such that for any database instance $I$, the 
length of every chase sequence is bounded by $Q(\|I\|)$, where $\|I\|$ is the number 
of distinct values in $I$. □ 



5 Data-dependent Chase Termination 

Static Termination Guarantees. Motivated by the example in Section 3, we 
now study data-dependent chase termination: given a constraint set $\Sigma$ and a 
fixed instance $I$, does the chase with $\Sigma$ terminate on $I$? Our first, static scheme 
relies on the observation that the chase will always terminate on instance $I$ if 
the subset of constraints that might fire when chasing $I$ with $\Sigma$ is inductively 
restricted. We call a constraint $\alpha \in \Sigma$ $(I, \Sigma)$-irrelevant iff there is no chase 
sequence $I \rightarrow \ldots \xrightarrow{\alpha,\bar{a}} \ldots$, i.e. iff $\alpha$ never fires when chasing $I$ with $\Sigma$, 
and formalize our observation in Lemma 2 below. 

Lemma 2. Let $\Sigma' \subseteq \Sigma$ s.t. $\Sigma \setminus \Sigma'$ is a set of $(I, \Sigma)$-irrelevant constraints. If $\Sigma'$ 
is inductively restricted, then the chase with $\Sigma$ terminates for instance $I$. □ 

Hence, the crucial point is to effectively compute $(I, \Sigma)$-irrelevant constraints. 
Unfortunately, one can show that $(I, \Sigma)$-irrelevance is undecidable in general. 

Theorem 2. Let $\Sigma$ be a set of constraints, $\alpha \in \Sigma$ a constraint, and $I$ an 
instance. It is undecidable if $\alpha$ is $(I, \Sigma)$-irrelevant. □ 

This result prevents us from computing the minimal set of constraints that will 
fire when chasing $I$. Still, we can give sufficient conditions that guarantee 
$(I, \Sigma)$-irrelevance for a constraint. We specify such a condition on top of the chase graph 
introduced in [3]. The chase graph for $\Sigma$ is the graph $G(\Sigma) = (\Sigma, E)$ with 
$(\alpha, \beta) \in E$ iff $\alpha \prec \beta$, where $\alpha \prec \beta$ holds for $\alpha, \beta \in \Sigma$ iff the first three 
bullets from Def. 4 hold. It has been shown that, given $\Sigma$, the chase graph can be 
computed by an NP-algorithm. 



Proposition 1. Let $I$ be an instance and $\Sigma$ be a set of constraints. Further let 
$\alpha_I := \exists \bar{x} \bigwedge_{R(\bar{x}') \in I} R(\bar{x}')$, where $\bar{x} := \bigcup_{R(\bar{x}') \in I} \bar{x}'$. If the chase graph $G(\Sigma \cup \{\alpha_I\})$ 
contains no directed path from $\alpha_I$ to $\beta \in \Sigma$, then $\beta$ is $(I, \Sigma)$-irrelevant. □ 

Proposition 1 combined with Lemma 2 gives us a sufficient data-dependent 
condition for chase termination, as illustrated in the following example. 

Example 5. Consider constraint set $\Sigma$ from Fig. 1 and $q_2$ from Section 3. We set 
$\alpha_I := \exists c_1, x_1, x_2, y_1, y_2\; \text{rail}(c_1, x_1, y_1), \text{fly}(x_1, x_2, y_2), \text{fly}(x_2, x_1, y_2), \text{rail}(x_1, c_1, y_1)$ 
and compute the chase graph $G(\Sigma \cup \{\alpha_I\}) := (\Sigma \cup \{\alpha_I\}, \{(\alpha_I, \alpha_1), (\alpha_3, \alpha_3)\})$. By 
Proposition 1, $\alpha_2$ and $\alpha_3$ are $(I, \Sigma)$-irrelevant. It holds that $\Sigma \setminus \{\alpha_2, \alpha_3\} = \{\alpha_1\}$ 
is inductively restricted, so we know from Lemma 2 that the chase of $q_2$ with $\Sigma$ 
terminates. Similar arguments hold for $q_2''$ and $q_2'''$ from Section 3. □ 
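
The static test of Proposition 1 amounts to a reachability query on the chase graph. The following Python sketch assumes that the chase graph has already been computed and is passed in as a set of edges; the function name and the string identifiers for the constraints are hypothetical.

```python
def irrelevant_constraints(sigma, edges, start):
    """Constraints of `sigma` that are not reachable from `start` in the chase graph.

    By Proposition 1, these constraints are irrelevant for the instance that
    `start` (the constraint alpha_I) encodes."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    reached, stack = set(), [start]
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return set(sigma) - reached

# Chase graph of Example 5: edges (alpha_I, alpha_1) and (alpha_3, alpha_3).
sigma = {"a1", "a2", "a3"}
edges = {("aI", "a1"), ("a3", "a3")}
print(irrelevant_constraints(sigma, edges, "aI"))
```

The constraints that remain after removing the returned set are then passed to a data-independent test such as inductive restriction, as in Lemma 2.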

Monitoring Chase Execution. If the previous data-dependent termination 
condition does not apply, we propose to monitor the chase run and abort if 
tuples are created that may potentially lead to non-termination. We introduce 
a data structure called monitor graph that allows us to track the chase run. 

Definition 9. A monitor graph is a tuple $(V, E)$, where $V \subseteq \Delta_{null} \times 2^{pos(\Sigma)}$ 
and $E \subseteq V \times \Sigma \times 2^{pos(\Sigma)} \times V$. □ 

A node in a monitor graph is a tuple $(n, \pi)$, where $n$ is a database value and $\pi$ 
the positions in which $n$ was first created (e.g. as null value with the help of some 
TGD). An edge $(n_1, \pi_1, \varphi_i, \Pi, n_2, \pi_2)$ between $(n_1, \pi_1)$ and $(n_2, \pi_2)$ is labeled with 
the constraint $\varphi_i$ that created $n_2$ and the set of positions $\Pi$ from the body of 
$\varphi_i$ in which $n_1$ occurred when $n_2$ was created. The monitor graph is successively 
constructed while running the chase, according to the following definition. 

Definition 10. The monitor graph $G_{\mathcal{S}}$ w.r.t. $\mathcal{S} = I_0 \xrightarrow{\varphi_0,\bar{a}_0} \ldots \xrightarrow{\varphi_{r-1},\bar{a}_{r-1}} I_r$ is a 
monitor graph that is inductively defined as follows 

• $G_0 = (\emptyset, \emptyset)$ is the empty chase segment graph. 

• If $i < r$ and $\varphi_i$ is an EGD then $G_{i+1} := G_i$. 

• If $i < r$ and $\varphi_i$ is a TGD then $G_{i+1}$ is obtained from $G_i = (V_i, E_i)$ as follows. 
If the chase step $I_i \xrightarrow{\varphi_i,\bar{a}_i} I_{i+1}$ does not introduce any new null values, then 
$G_{i+1} := G_i$. Otherwise, $V_{i+1}$ is set as the union of $V_i$ and all pairs $(n, \pi)$, 
where $n$ is a newly introduced null value and $\pi$ the set of positions in which 
$n$ occurs. $E_{i+1} := E_i \cup \{ (n_1, \pi_1, \varphi_i, \Pi, n_2, \pi_2) \mid (n_1, \pi_1) \in V_i, (n_2, \pi_2) \in 
V_{i+1} \setminus V_i$ and $\Pi$ is the set of positions in $body(\varphi_i(\bar{a}_i))$ where $n_1$ occurs $\}$. □ 
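
The TGD case of Definition 10 can be written as a single update function. The Python sketch below is a literal transcription under assumptions: the caller supplies the instantiated body atoms of the applied constraint and a map from each newly created null to the set of positions in which it occurs; positions are `(predicate, index)` pairs, and all names are hypothetical.

```python
def monitor_step(V, E, constraint, body_facts, new_nulls):
    """Extend the monitor graph (V, E) by one TGD chase step.

    body_facts: instantiated body atoms (predicate, args) of the applied TGD.
    new_nulls:  dict mapping each fresh null to a frozenset of its positions."""
    if not new_nulls:
        return V, E  # no fresh nulls: the graph is unchanged
    new_nodes = set(new_nulls.items())
    new_edges = set()
    for n1, pi1 in V:
        # Positions of the instantiated body in which the older null n1 occurs.
        big_pi = frozenset(
            (pred, i)
            for pred, args in body_facts
            for i, a in enumerate(args, 1)
            if a == n1
        )
        for n2, pi2 in new_nodes:
            new_edges.add((n1, pi1, constraint, big_pi, n2, pi2))
    return V | new_nodes, E | new_edges

# Two applications of alpha_3 from Figure 1 when chasing q_1.
F2, F3 = frozenset({("fly", 2)}), frozenset({("fly", 3)})
V, E = monitor_step(set(), set(), "a3", [("fly", ("x1", "x2", "y2"))], {"x3": F2, "y3": F3})
V, E = monitor_step(V, E, "a3", [("fly", ("x2", "x3", "y3"))], {"x4": F2, "y4": F3})
print(len(V), len(E))
```

The first step adds two nodes and no edges because the graph starts empty; the second step connects both older nulls to both new ones.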

Our next task is to define a necessary criterion for non-termination on top of 
the monitor graph. To this end, we introduce the notion of k-cyclicity. 

Definition 11. Let $G = (V, E)$ be a monitor graph and $k \in \mathbb{N}$. $G$ is called 
$k$-cyclic if and only if there are pairwise distinct $v_1, \ldots, v_k \in V$ such that 

• there is a path in $E$ that sequentially contains $v_1$ to $v_k$ and 

• for all $i \in [k - 1]$: $p_{2,3,4,6}(v_i) = p_{2,3,4,6}(v_{i+1})$. □ 
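
A possible check for $k$-cyclicity is sketched below in Python. It rests on an interpretation that should be read as an assumption: the elements $v_1, \ldots, v_k$ are taken to be edges of the monitor graph, since the projection $p_{2,3,4,6}$ is defined on 6-tuples, and "sequentially contains" is read as "the target of one edge reaches the source of the next". The code also assumes the graph is acyclic, which holds when every edge points from an older null to a newly created one.

```python
from functools import lru_cache

def cycle_depth(edges):
    """Length of the longest chain of same-signature edges along one path.

    Under the stated interpretation, the graph is k-cyclic iff the result is >= k."""
    edges = list(edges)
    succ = {}
    for n1, p1, _, _, n2, p2 in edges:
        succ.setdefault((n1, p1), set()).add((n2, p2))

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def signature(e):
        # Components 2, 3, 4 and 6 of the edge tuple.
        return (e[1], e[2], e[3], e[5])

    @lru_cache(maxsize=None)
    def chain(i):
        reach = reachable((edges[i][4], edges[i][5]))
        best = 0
        for j, e in enumerate(edges):
            if j != i and signature(e) == signature(edges[i]) and (e[0], e[1]) in reach:
                best = max(best, chain(j))
        return 1 + best

    return max((chain(i) for i in range(len(edges))), default=0)

F1, F2 = frozenset({("fly", 1)}), frozenset({("fly", 2)})
E = {
    ("x3", F2, "a3", F2, "x4", F2),
    ("x4", F2, "a3", F2, "x5", F2),
    ("x3", F2, "a3", F1, "x5", F2),
}
print(cycle_depth(E))
```

An application would compare the returned depth with its chosen limit after every chase step and abort as soon as the limit is reached.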



We call a chase sequence $k$-cyclic if its monitor graph is $k$-cyclic. A chase sequence 
may potentially be infinite if some finite prefix is $k$-cyclic, for any $k \geq 1$: 

Lemma 3. Let $k \in \mathbb{N}$. If there is some infinite chase sequence $\mathcal{S}$ when chasing 
$I_0$ with $\Sigma$, then there is some finite prefix of $\mathcal{S}$ that is $k$-cyclic. □ 

To avoid non-termination, an application can fix a cycle-depth $k$ and stop the 
chase when this limit is exceeded. For every terminating chase sequence there 
is a $k$ s.t. the sequence is not $k$-cyclic, so if $k$ is chosen large enough the chase 
will succeed. We argue that $k$-cyclicity is a natural condition that considers only 
situations that may cause non-termination, so our approach is preferable to 
blindly chasing the instance and stopping after a fixed amount of time or number 
of chase steps. As justified by the following proposition, the choice of $k$ follows a 
pay-as-you-go principle: for larger $k$-values the chase will succeed in more cases. 
We refer the interested reader to the proof of the proposition for an example. 

Proposition 2. For each $k \in \mathbb{N}$ there is some $\Sigma_k$ and $I_k$ s.t. (a) both $\Sigma_k$ and the 
subset of constraints in $\Sigma_k$ that are not $(I_k, \Sigma_k)$-irrelevant are not inductively 
restricted; (b) every chase sequence for $I_k$ with $\Sigma_k$ is $(k - 1)$-, but not $k$-cyclic. □ 
APPENDIX 



A Proof of Lemma 1 

1. Let $\Sigma$ be safely restricted. By construction every element $\Sigma' \in$ part($\Sigma$) is 
contained in some strongly connected component $C_{\Sigma'}$ of the minimal restriction 
system of $\Sigma$. By assumption $C_{\Sigma'}$ is safe, so every subset of $C_{\Sigma'}$ is also 
safe. Thus, $\Sigma'$ is safe. 

2. See Example 4. 

3. It was shown in [14] that the relation $\prec_P$ (for a set of positions $P$) can be 
decided by an NP-algorithm. For an input $\Sigma$, the set part($\Sigma$) can thus be 
computed in non-deterministic polynomial time. To check whether $\Sigma$ is not 
inductively restricted, guess some $\Sigma' \in$ part($\Sigma$) and verify that it is not safe. 
This implies our claim. 

B Proof of Theorem 1 

The proof of this theorem is by induction on the depth of the recursive calls 
$n$ during the execution of algorithm part with input $\Sigma$. If $n = 0$, $\Sigma$ is safely 
restricted and it was shown in [14] that the chase terminates in polynomial time 
data complexity for this case. If $n > 0$, then consider the strongly connected 
components $C_1, \ldots, C_m$ of the minimal restriction system of $\Sigma$. By induction 
hypothesis, chasing with $C_i$ terminates in time $Q_i(\|I\|)$. The rest of this proof is 
analogous to the construction in the induction step from the proof of Theorem 11 
in [13] (showing that the chase for safely restricted constraints terminates in 
polynomial-time data complexity) and therefore is omitted here. 

C Proof of Lemma 2 

It holds that $\Sigma'$ contains all constraints that may fire during the execution of 
the chase starting with $I$ and $\Sigma$. So, if $I^{\Sigma'}$ exists then $I^{\Sigma}$ exists and $I^{\Sigma} = I^{\Sigma'}$. 
If $\Sigma'$ is inductively restricted, then $I^{\Sigma'}$ exists, which implies the claim. 

D Proof of Theorem 2 

It is well-known that the following problem is undecidable: given a Turing 
machine $M$ and a state transition $t$ from the description of $M$, does $M$ reach $t$ 
(given the empty string as input)? From $(M, t)$, we will compute a set of TGDs 
and EGDs $\Sigma_M$ and a TGD $\alpha_t \in \Sigma_M$ such that the following equivalence holds: 
$M$ reaches $t$ (given the empty string as input) $\Leftrightarrow$ there is a chase sequence in 
the computation of the chase with $\Sigma_M$ applied to the empty instance such that 
$\alpha_t$ will eventually fire. 

Our reduction uses the construction in the proof of Theorem 1 in [4]. To be 
self-contained, we review it here again. We use the signature consisting of the 
relation symbols: $T(x, a, y)$ tape "horizontal" edge from $x$ to $y$ with symbol $a$; 
$H(x, s, y)$ head "horizontal" edge from $x$ to $y$ with state $s$; $L(x, y)$ left "vertical" 
edge; $R(x, y)$ right "vertical" edge; $A_\delta(x)$, $B_\delta(x)$ for every state transition $\delta$; one 
constant for every tape symbol, one constant for every head state, the special 
constant $B$ marking the beginning of the tape and $\Box$ to denote an empty tape 
cell. The set of constraints $\Sigma_M$ is as follows. 

1. To set the initial configuration:

$\exists w, x, y, z\; T(w, B, x), T(x, \square, y), H(x, s_0, y), T(y, E, z)$

where $\square$ is the blank symbol and $s_0$ is the initial state (both are constants).

2. For every state transition $\delta$ which moves the head to the right, replacing
symbol $a$ with $a'$ and going from state $s$ to state $s'$:

$T(x, a, y), H(x, s, y), T(y, b, z) \rightarrow$
$\exists x', y', z'\; L(x, x'), R(y, y'), R(z, z'), T(x', a', y'),$
$T(y', b, z'), H(y', s', z'), A_\delta(w')$.

Here $a$, $s$, $a'$, $b$, and $s'$ are constants.

3. For every state transition $\delta$ which moves the head to the right past the end
of the tape, replacing symbol $a$ with $a'$ and going from state $s$ to state $s'$:

$T(x, a, y), H(x, s, y), T(y, E, z) \rightarrow$
$\exists w', x', y', z'\; L(x, x'), R(y, y'), R(z, z'), T(x', a', y'),$
$T(y', \square, z'), H(y', s', z'), T(y', E, w'), A_\delta(w')$.

Here $a$, $s$, $a'$, $b$, and $s'$ are constants.

4. Similarly for state transitions which move the head to the left. 

5. Similarly for state transitions which do not move the head. 

6. For every state transition $\delta$:

$A_\delta(x) \rightarrow B_\delta(x)$

7. Left copy:

$T(x, a, y), L(y, y') \rightarrow \exists x'\; L(x, x'), T(x', a, y')$.
Here $a$ is a constant.

8. Right copy:

$T(x, a, y), R(x, x') \rightarrow \exists y'\; T(x', a, y'), R(y, y')$.
Here $a$ is a constant.

The state transition $t$ is transformed to $\alpha_t$ in the same way as in item 6
above. It is crucial to the proof that every state transition $\delta$ in $M$ is represented
as a single TGD $A_\delta(x) \rightarrow B_\delta(x)$. The constraint for the initial configuration
fires exactly once. The computation of the chase with this set of constraints can
be understood as a grid in which each row represents a configuration of
the Turing machine. It can be shown that $(M, t)$ is a yes-instance if and only if
$(\Sigma_M, \alpha_t)$ is a yes-instance. Thus, the equivalence from above holds.

E Proof of Proposition [1] 

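Assume that $\beta$ is not $(I, \Sigma)$-irrelevant. Then, there is a chase sequence
$I \xrightarrow{\alpha_1, \bar{a}_1} I_1 \rightarrow \cdots \xrightarrow{\alpha_r, \bar{a}_r} I_r \xrightarrow{\beta, \bar{b}} \cdots$.
If $\alpha_1 \prec \beta$, we are finished. Otherwise, there must be
some $n_r \in [r]$ such that $\alpha_{n_r} \prec \beta$ (otherwise $\beta$ could not fire). If $\alpha_1 \prec \alpha_{n_r}$, we are
finished. Otherwise, there must be some $n_{r-1} \in [n_r - 1]$ such that $\alpha_{n_{r-1}} \prec \alpha_{n_r}$
(otherwise $\alpha_{n_r}$ could not fire). After finitely many iterations of this
process we have $\alpha_1 \prec \cdots \prec \alpha_{n_r} \prec \beta$. Therefore, the chase graph
contains a directed path from $\alpha_1$ to $\beta$.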
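
The argument above boils down to a reachability statement on the chase graph. As a rough illustration only, the following Python sketch computes which constraints can be reached by a directed path from a given constraint. The adjacency-dictionary encoding and the names `chase_graph` and `reachable_from` are assumptions made for this example; they do not come from the paper.

```python
from collections import deque

def reachable_from(chase_graph, start):
    """Return all constraints reachable from `start` via a directed path of length >= 1.

    chase_graph: dict mapping a constraint name to the names of the constraints
    it has an edge to (hypothetical encoding of the chase graph).
    """
    seen = set()
    queue = deque(chase_graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(chase_graph.get(node, ()))
    return seen

# Toy chase graph: alpha1 -> alpha2 -> beta, and an unrelated constraint gamma.
toy_graph = {"alpha1": ["alpha2"], "alpha2": ["beta"], "gamma": ["gamma"]}
print(reachable_from(toy_graph, "alpha1"))
```

In this toy graph, `beta` is reachable from `alpha1`, while `gamma` is not.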

F Proof of Lemma [3] 

Assume that 

• we have an infinite chase sequence $S = (I_i)_{i \in \mathbb{N}}$ and

• there is some $k \in \mathbb{N}$ such that every finite prefix of $S$ is not $k$-cyclic.

Let $(S_i)_{i \in \mathbb{N}}$ be the sequence of finite prefixes of $S$ (such that $S_i$ is a chase sequence
of length $i$) and let $(G_{S_i})_{i \in \mathbb{N}}$ be the respective sequence of monitor graphs. A path
in a monitor graph is a finite sequence of edges $e_1, \ldots, e_l$ (and not of nodes) such
that $p_{5,6}(e_i) = p_{1,2}(e_{i+1})$ for $i \in [l - 1]$.
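
The path condition can be checked mechanically. The sketch below assumes that an edge is encoded as a 6-tuple whose first two components identify the source node and whose last two components identify the target node; this encoding is inferred from the projection indices above, and the function name `is_path` is hypothetical.

```python
def is_path(edges):
    """Check p_{5,6}(e_i) = p_{1,2}(e_{i+1}) for every pair of consecutive edges.

    edges: list of 6-tuples (assumed encoding of monitor-graph edges).
    """
    return all(e[4:6] == f[0:2] for e, f in zip(edges, edges[1:]))

# Two consecutive edges n1 -> n2 -> n3 (labels are placeholders).
e1 = ("n1", "R1", "phi", "R1", "n2", "R1")
e2 = ("n2", "R1", "phi", "R1", "n3", "R1")
print(is_path([e1, e2]), is_path([e2, e1]))
```

Under this encoding, `[e1, e2]` is a path and `[e2, e1]` is not, because the target of `e2` differs from the source of `e1`.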

We define the notion of depth of a node in a monitor graph. Let $v$ be a node
in $G_{S_i}$ and $\mathrm{pred}(v)$ the set of predecessors of $v$. If $v$ has no predecessors,
the depth of $v$, $\mathrm{depth}_{G_{S_i}}(v)$, is defined as zero. If $v$ has predecessors, then
$\mathrm{depth}_{G_{S_i}}(v) := 1 + \max\{\, \mathrm{depth}_{G_{S_i}}(w) \mid w \in \mathrm{pred}(v) \,\}$.
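
As an illustration of this recursive definition, the following sketch computes the depth of every node of an acyclic graph. The predecessor-dictionary encoding and the name `node_depths` are assumptions made for this example.

```python
from functools import lru_cache

def node_depths(predecessors):
    """Compute depth(v) = 0 if v has no predecessors, else 1 + max depth of a predecessor.

    predecessors: dict mapping each node to a list of its predecessor nodes
    (hypothetical encoding; the graph is assumed to be acyclic).
    """
    @lru_cache(maxsize=None)
    def depth(v):
        preds = predecessors.get(v, ())
        if not preds:
            return 0
        return 1 + max(depth(w) for w in preds)

    # Include nodes that occur only as predecessors of other nodes.
    nodes = set(predecessors)
    for preds in predecessors.values():
        nodes.update(preds)
    return {v: depth(v) for v in nodes}

example = {"v2": ["v1"], "v3": ["v1", "v2"]}
print(node_depths(example))
```

Here `v1` has no predecessors and gets depth 0, `v2` gets depth 1, and `v3` gets depth 2 because its deepest predecessor is `v2`.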

The following claim follows immediately from the definition of the monitor graph. 
The formal proof is left to the reader. 

Proposition 3. Let $v$ be a node in $G_{S_i}$ and $j > i$.

• $G_{S_i}$ is an acyclic labeled tree.

• Every null value that appears in $I_i$ appears in some first position of a node
in $G_{S_i}$.

• There is a homomorphism $h_{ij}$ from $G_{S_i}$ to $G_{S_j}$ such that
$\mathrm{depth}_{G_{S_i}}(v) \le \mathrm{depth}_{G_{S_j}}(h_{ij}(v))$.

• If $I_i \xrightarrow{\alpha_i, \bar{a}_i} I_{i+1}$, $b \in \bar{a}_i$ is a null value, and $c$ is a null value that was newly created
in this step, then the depth of any node in $G_{S_{i+1}}$ in which $b$ appears is strictly
smaller than the depth of any node in $G_{S_{i+1}}$ in which $c$ appears. (Proof by
induction on $i$.) □

The next proposition is the most important step in the proof of this lemma and
follows directly from bullet four in Proposition 3.

Proposition 4. For every $d \in \mathbb{N} \cup \{0\}$ there is a number $k_d \in \mathbb{N}$ such
that for every $i \in \mathbb{N}$ it holds that $|\{\, v \mid \mathrm{depth}_{G_{S_i}}(v) \le d \,\}| \le k_d$. Note that $k_d$ is
independent of $i$. (Proof by induction on $d$.) □

The homomorphism in Proposition 3 leaves relational symbols and constraints untouched, i.e., it is the
identity on elements from A.

We observe another fact.

Proposition 5. There is some $p_k \in \mathbb{N}$ such that if some $G_{S_i}$ has a path of
length $p_k$, then $S_i$ is $k$-cyclic. □

This is because we have only a bounded number of relational symbols and constraints
available. The remaining step in the proof is to show that if we choose $i$
large enough, then $G_{S_i}$ contains a path of length $p_k$. Assume that this claim does
not hold. By Proposition 4, the number of nodes of a certain depth is bounded
(independent of $i$). So, if for every $i$ there were no path of length $p_k$ in $G_{S_i}$,
then the number of nodes in $G_{S_i}$ would be bounded (independent of $i$). This implies
that the chase has introduced only a bounded number of fresh null values,
which contradicts the assumption of an infinite chase sequence.

G Proof of Proposition [2] 

We set $I_k := \{S(c_1), \ldots, S(c_k), R_k(c_1, \ldots, c_k)\}$ and

$\Sigma_k := \{\varphi\}$, where $\varphi := S(x_k), R_k(x_1, \ldots, x_k) \rightarrow \exists y\, R_k(y, x_1, \ldots, x_{k-1})$.

First observe that $\Sigma_k$ contains no $(I_k, \Sigma_k)$-irrelevant constraints, so the subset
of the constraints in $\Sigma_k$ that is not $(I_k, \Sigma_k)$-irrelevant equals $\Sigma_k$. It is easy to
verify that $\Sigma_k$ is not inductively restricted, although the chase with $\Sigma_k$ always
terminates, independent of the underlying data instance, so condition (a) holds.
We now chase $I_k$ with $\Sigma_k$. There is only one possible chase sequence $(J_i)_{0 \le i \le k}$,
defined by $J_0 := I_k$ and, for $1 \le i \le k$, $J_i := J_{i-1} \cup \{R_k(n_i, \ldots, n_1, c_1, \ldots, c_{k-i})\}$, where
$n_1, \ldots, n_k$ are fresh null values. It holds that $J_k \models \Sigma_k$.
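
The chase sequence just described can be replayed with a small simulation. This is a minimal sketch for this single constraint only, assuming a standard chase step (the constraint fires on a trigger only if its head is not already satisfied). The function name `chase_example` and the string encoding of constants `c1, c2, ...` and nulls `n1, n2, ...` are assumptions made for the example.

```python
def chase_example(k):
    """Chase {S(c_1..c_k), R(c_1, ..., c_k)} with
    S(x_k), R(x_1, ..., x_k) -> exists y R(y, x_1, ..., x_{k-1}).

    Returns the list of R-fact sets, one per instance of the chase sequence.
    """
    s_facts = {f"c{i}" for i in range(1, k + 1)}          # S(c_1), ..., S(c_k)
    r_facts = {tuple(f"c{i}" for i in range(1, k + 1))}   # R(c_1, ..., c_k)
    sequence = [set(r_facts)]
    fresh = 0
    while True:
        trigger = None
        for t in sorted(r_facts):
            body_holds = t[-1] in s_facts
            head_holds = any(u[1:] == t[:-1] for u in r_facts)
            if body_holds and not head_holds:
                trigger = t
                break
        if trigger is None:
            return sequence          # no violated trigger is left
        fresh += 1
        r_facts.add((f"n{fresh}",) + trigger[:-1])   # shift right, add a fresh null
        sequence.append(set(r_facts))

steps = chase_example(3)
print(len(steps) - 1, sorted(steps[-1]))
```

For `k = 3` the simulation performs three steps and stops once the last position of the newest fact holds a null value, for which no `S` fact exists.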

The monitor graph w.r.t. $(J_i)_{0 \le i \le k}$ is $(V, E)$, where $V := \{\, (n_i, R_k^1) \mid i \in [k] \,\}$ and
$E := \{\, (n_i, R_k^1, \varphi, R_k^1, n_j, R_k^1) \mid 1 \le i < j \le k \,\}$. We observe that the sequence
is $(k - 1)$-cyclic because $(n_1, R_k^1, \varphi, R_k^1, n_2, R_k^1), \ldots, (n_{k-1}, R_k^1, \varphi, R_k^1, n_k, R_k^1)$
constitute a path in the monitor graph that satisfies the conditions of the definition of
$(k - 1)$-cyclicity. The chase sequence is not $k$-cyclic because there is no path of
length at least $k$ in the monitor graph. This proves part (b) of the proposition.
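
The path-length claim can be checked on small instances. The sketch below rebuilds the example graph under the assumption that nodes are pairs (null, position) and edges are 6-tuples from $n_i$ to $n_j$ for $i < j$, and then computes the length of a longest path. The names `monitor_graph_example` and `longest_path_length` and the labels `"R1"` and `"phi"` are placeholders chosen for this example.

```python
def monitor_graph_example(k):
    """Build the example graph: nodes (n_i, R1), edges n_i -> n_j for i < j."""
    nodes = [(f"n{i}", "R1") for i in range(1, k + 1)]
    edges = [nodes[i] + ("phi", "R1") + nodes[j]
             for i in range(k) for j in range(i + 1, k)]
    return nodes, edges

def longest_path_length(nodes, edges):
    """Length (number of edges) of a longest path.

    Relies on `edges` being ordered by increasing source index, which is a
    topological order for the graph built above.
    """
    longest = {v: 0 for v in nodes}
    for e in edges:
        src, dst = e[0:2], e[4:6]
        longest[dst] = max(longest[dst], longest[src] + 1)
    return max(longest.values())

nodes, edges = monitor_graph_example(4)
print(longest_path_length(nodes, edges))
```

For `k = 4` the longest path has three edges, in line with the observation that a path of length $k - 1$ exists but none of length $k$.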



